{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Recommender System"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "In this example we are creating a movie recommender system by extractung feature vectors using PaddlePaddle, importing movie vectors into Milvus, and then searching in Milvus and Redis."
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Data\r\n",
    "In this project, we use [MovieLens 1M](https://grouplens.org/datasets/movielens/1m/). This dataset contains 1,000,209 anonymous ratings of approximately 3,900 movies made by 6,040 MovieLens users. \r\n",
    "\r\n",
    "We use the following files:\r\n",
    "- movies.dat: Contains movie information.\r\n",
    "- movie_vectors.txt: Contains movie vectors that can be imported to Milvus easily.\r\n",
    "\r\n",
    "File structure:\r\n",
    "\r\n",
    " - movMovieID::Title::Genres   \r\n",
    "\r\n",
    "     - Titles are identical to titles provided by the IMDB (includingyear of release)\r\n",
    " \r\n",
    "     - Genres are pipe-separated\r\n",
    "\r\n",
    "     - Some MovieIDs do not correspond to a movie due to accidental duplicate entries and/or test entries\r\n",
    " \r\n",
    "    - Movies are mostly entered by hand, so errors and inconsistencies may exist\r\n",
    "\r\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Requirements"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "Due to package constraints, this notebook needs to be run using Python 3.6/3.7 . It is recommended that you use a virtual enviroment like Conda, instructions for installing Conda can be found [here](https://conda.io/projects/conda/en/latest/user-guide/install/index.html). \n",
    "\n",
    "Currently, there is a dirty workaround that you can use for python 3.8. When installing the `requirements.txt`, pip will fail to install`sentencepiece`. If you rerun the notebook after the install fails and avoid redownloading the packages, the rest of the notebook should run without any hiccups.\n",
    "\n",
    "|  Packages |  Servers |\n",
    "| --------------- | -------------- |\n",
    "| pymilvus==2.0.0rc5 | milvus-2.0.0-rc5 |\n",
    "| redis           | redis          |\n",
    "| paddle_serving_app |\n",
    "| paddlepaddle==2.1.1 |\n",
    "\n",
    "\n",
    "We have included a requirements.txt file in order to easily satisfy the required packages. "
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Up and Running"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Installing Packages\n",
    "Install the required python packages with `requirements.txt`. If using Python 3.8, look at workaround in the Requirements section. Uninstall previous pymilvus-orm if needed."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "source": [
    "! pip install -r requirements.txt"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "Requirement already satisfied: pymilvus==2.0.0rc5 in /Users/mengjiagu/opt/anaconda3/lib/python3.8/site-packages (from -r requirements.txt (line 1)) (2.0.0rc5)\n",
      "Collecting paddle_serving_app\n",
      "  Downloading paddle_serving_app-0.6.3-py3-none-any.whl (52 kB)\n",
      "\u001b[K     |████████████████████████████████| 52 kB 176 kB/s eta 0:00:01\n",
      "\u001b[?25hCollecting paddlepaddle==2.1.1\n",
      "  Using cached paddlepaddle-2.1.1-cp38-cp38-macosx_10_14_x86_64.whl (72.2 MB)\n",
      "Requirement already satisfied: redis in /Users/mengjiagu/opt/anaconda3/lib/python3.8/site-packages (from -r requirements.txt (line 4)) (3.5.3)\n",
      "Collecting astor\n",
      "  Using cached astor-0.8.1-py2.py3-none-any.whl (27 kB)\n",
      "Requirement already satisfied: protobuf>=3.1.0 in /Users/mengjiagu/opt/anaconda3/lib/python3.8/site-packages (from paddlepaddle==2.1.1->-r requirements.txt (line 3)) (3.17.3)\n",
      "Collecting decorator==4.4.2\n",
      "  Using cached decorator-4.4.2-py2.py3-none-any.whl (9.2 kB)\n",
      "Requirement already satisfied: six in /Users/mengjiagu/opt/anaconda3/lib/python3.8/site-packages (from paddlepaddle==2.1.1->-r requirements.txt (line 3)) (1.15.0)\n",
      "Requirement already satisfied: numpy>=1.13 in /Users/mengjiagu/opt/anaconda3/lib/python3.8/site-packages (from paddlepaddle==2.1.1->-r requirements.txt (line 3)) (1.20.1)\n",
      "Collecting gast>=0.3.3\n",
      "  Using cached gast-0.5.2-py3-none-any.whl (10 kB)\n",
      "Requirement already satisfied: Pillow in /Users/mengjiagu/opt/anaconda3/lib/python3.8/site-packages (from paddlepaddle==2.1.1->-r requirements.txt (line 3)) (8.2.0)\n",
      "Requirement already satisfied: requests>=2.20.0 in /Users/mengjiagu/opt/anaconda3/lib/python3.8/site-packages (from paddlepaddle==2.1.1->-r requirements.txt (line 3)) (2.25.1)\n",
      "Requirement already satisfied: grpcio==1.37.1 in /Users/mengjiagu/opt/anaconda3/lib/python3.8/site-packages (from pymilvus==2.0.0rc5->-r requirements.txt (line 1)) (1.37.1)\n",
      "Requirement already satisfied: grpcio-tools==1.37.1 in /Users/mengjiagu/opt/anaconda3/lib/python3.8/site-packages (from pymilvus==2.0.0rc5->-r requirements.txt (line 1)) (1.37.1)\n",
      "Requirement already satisfied: pandas==1.2.4 in /Users/mengjiagu/opt/anaconda3/lib/python3.8/site-packages (from pymilvus==2.0.0rc5->-r requirements.txt (line 1)) (1.2.4)\n",
      "Requirement already satisfied: mmh3 in /Users/mengjiagu/opt/anaconda3/lib/python3.8/site-packages (from pymilvus==2.0.0rc5->-r requirements.txt (line 1)) (3.0.0)\n",
      "Requirement already satisfied: ujson>=2.0.0 in /Users/mengjiagu/opt/anaconda3/lib/python3.8/site-packages (from pymilvus==2.0.0rc5->-r requirements.txt (line 1)) (4.0.2)\n",
      "Requirement already satisfied: setuptools in /Users/mengjiagu/opt/anaconda3/lib/python3.8/site-packages (from grpcio-tools==1.37.1->pymilvus==2.0.0rc5->-r requirements.txt (line 1)) (52.0.0.post20210125)\n",
      "Requirement already satisfied: python-dateutil>=2.7.3 in /Users/mengjiagu/opt/anaconda3/lib/python3.8/site-packages (from pandas==1.2.4->pymilvus==2.0.0rc5->-r requirements.txt (line 1)) (2.8.1)\n",
      "Requirement already satisfied: pytz>=2017.3 in /Users/mengjiagu/opt/anaconda3/lib/python3.8/site-packages (from pandas==1.2.4->pymilvus==2.0.0rc5->-r requirements.txt (line 1)) (2021.1)\n",
      "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/mengjiagu/opt/anaconda3/lib/python3.8/site-packages (from requests>=2.20.0->paddlepaddle==2.1.1->-r requirements.txt (line 3)) (1.26.4)\n",
      "Requirement already satisfied: idna<3,>=2.5 in /Users/mengjiagu/opt/anaconda3/lib/python3.8/site-packages (from requests>=2.20.0->paddlepaddle==2.1.1->-r requirements.txt (line 3)) (2.10)\n",
      "Requirement already satisfied: chardet<5,>=3.0.2 in /Users/mengjiagu/opt/anaconda3/lib/python3.8/site-packages (from requests>=2.20.0->paddlepaddle==2.1.1->-r requirements.txt (line 3)) (4.0.0)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /Users/mengjiagu/opt/anaconda3/lib/python3.8/site-packages (from requests>=2.20.0->paddlepaddle==2.1.1->-r requirements.txt (line 3)) (2020.12.5)\n",
      "Collecting pyclipper\n",
      "  Using cached pyclipper-1.3.0-cp38-cp38-macosx_10_9_x86_64.whl (132 kB)\n",
      "Collecting shapely\n",
      "  Using cached Shapely-1.7.1-cp38-cp38-macosx_10_9_x86_64.whl (1.0 MB)\n",
      "Collecting opencv-python<=4.2.0.32\n",
      "  Using cached opencv_python-4.2.0.32-cp38-cp38-macosx_10_9_x86_64.whl (47.9 MB)\n",
      "Collecting sentencepiece<=0.1.92\n",
      "  Using cached sentencepiece-0.1.91-cp38-cp38-macosx_10_6_x86_64.whl (1.0 MB)\n",
      "Installing collected packages: shapely, sentencepiece, pyclipper, opencv-python, gast, decorator, astor, paddlepaddle, paddle-serving-app\n",
      "  Attempting uninstall: decorator\n",
      "    Found existing installation: decorator 5.0.6\n",
      "    Uninstalling decorator-5.0.6:\n",
      "      Successfully uninstalled decorator-5.0.6\n",
      "Successfully installed astor-0.8.1 decorator-4.4.2 gast-0.5.2 opencv-python-4.2.0.32 paddle-serving-app-0.6.3 paddlepaddle-2.1.1 pyclipper-1.3.0 sentencepiece-0.1.91 shapely-1.7.1\n"
     ]
    }
   ],
   "metadata": {
    "jupyter": {
     "outputs_hidden": true
    },
    "tags": []
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Starting Milvus Server\n",
    "\n",
    "This demo uses Milvus 2.0 Standalone with docker-compose, please refer to [Install Milvus 2.0](https://milvus.io/docs/v2.0.0/install_standalone-docker.md) for other installation options (on Kubernetes or use Milvus Cluster)."
   ],
   "metadata": {
    "tags": []
   }
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "source": [
    "!wget https://github.com/milvus-io/milvus/releases/download/v2.0.0-rc5-hotfix1/milvus-standalone-docker-compose.yml -O docker-compose.yml\n",
    "!docker-compose up -d"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "--2021-09-02 16:44:35--  https://github.com/milvus-io/milvus/releases/download/v2.0.0-rc5-hotfix1/milvus-standalone-docker-compose.yml\n",
      "Resolving github.com (github.com)... 10.16.78.56\n",
      "Connecting to github.com (github.com)|10.16.78.56|:443... connected.\n",
      "HTTP request sent, awaiting response... 302 Found\n",
      "Location: https://github-releases.githubusercontent.com/208728772/3ea455ae-d2d4-4f6d-9041-b35b940074e3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20210902%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210902T084437Z&X-Amz-Expires=300&X-Amz-Signature=47effc10799cf583f017c1b50c9a45cdaf063eb0b8072c5a25d3bc63785b50e1&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=208728772&response-content-disposition=attachment%3B%20filename%3Dmilvus-standalone-docker-compose.yml&response-content-type=application%2Foctet-stream [following]\n",
      "--2021-09-02 16:44:37--  https://github-releases.githubusercontent.com/208728772/3ea455ae-d2d4-4f6d-9041-b35b940074e3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20210902%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210902T084437Z&X-Amz-Expires=300&X-Amz-Signature=47effc10799cf583f017c1b50c9a45cdaf063eb0b8072c5a25d3bc63785b50e1&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=208728772&response-content-disposition=attachment%3B%20filename%3Dmilvus-standalone-docker-compose.yml&response-content-type=application%2Foctet-stream\n",
      "Resolving github-releases.githubusercontent.com (github-releases.githubusercontent.com)... 10.16.83.147\n",
      "Connecting to github-releases.githubusercontent.com (github-releases.githubusercontent.com)|10.16.83.147|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 1184 (1.2K) [application/octet-stream]\n",
      "Saving to: ‘docker-compose.yml’\n",
      "\n",
      "docker-compose.yml  100%[===================>]   1.16K  --.-KB/s    in 0s      \n",
      "\n",
      "2021-09-02 16:44:38 (19.5 MB/s) - ‘docker-compose.yml’ saved [1184/1184]\n",
      "\n",
      "\u001b[1A\u001b[1B\u001b[0G\u001b[?25l[+] Running 0/0\n",
      "\u001b[37m ⠋ Container milvus-minio  Creating                                        0.1s\n",
      "\u001b[0m\u001b[37m ⠋ Container milvus-etcd   Creating                                        0.1s\n",
      "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 1/2\n",
      "\u001b[37m ⠿ Container milvus-minio       Starting                                   0.2s\n",
      "\u001b[0m\u001b[37m ⠿ Container milvus-etcd        Starting                                   0.2s\n",
      "\u001b[0m\u001b[34m ⠿ Container milvus-standalone  Created                                    0.1s\n",
      "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 1/3\n",
      "\u001b[37m ⠿ Container milvus-minio       Starting                                   0.3s\n",
      "\u001b[0m\u001b[37m ⠿ Container milvus-etcd        Starting                                   0.3s\n",
      "\u001b[0m\u001b[34m ⠿ Container milvus-standalone  Created                                    0.1s\n",
      "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 1/3\n",
      "\u001b[37m ⠿ Container milvus-minio       Starting                                   0.4s\n",
      "\u001b[0m\u001b[37m ⠿ Container milvus-etcd        Starting                                   0.4s\n",
      "\u001b[0m\u001b[34m ⠿ Container milvus-standalone  Created                                    0.1s\n",
      "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 1/3\n",
      "\u001b[37m ⠿ Container milvus-minio       Starting                                   0.5s\n",
      "\u001b[0m\u001b[37m ⠿ Container milvus-etcd        Starting                                   0.5s\n",
      "\u001b[0m\u001b[34m ⠿ Container milvus-standalone  Created                                    0.1s\n",
      "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 1/3\n",
      "\u001b[37m ⠿ Container milvus-minio       Starting                                   0.6s\n",
      "\u001b[0m\u001b[37m ⠿ Container milvus-etcd        Starting                                   0.6s\n",
      "\u001b[0m\u001b[34m ⠿ Container milvus-standalone  Created                                    0.1s\n",
      "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 1/3\n",
      "\u001b[37m ⠿ Container milvus-minio       Starting                                   0.7s\n",
      "\u001b[0m\u001b[37m ⠿ Container milvus-etcd        Starting                                   0.7s\n",
      "\u001b[0m\u001b[34m ⠿ Container milvus-standalone  Created                                    0.1s\n",
      "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 2/3\n",
      "\u001b[34m ⠿ Container milvus-minio       Started                                    0.7s\n",
      "\u001b[0m\u001b[37m ⠿ Container milvus-etcd        Starting                                   0.8s\n",
      "\u001b[0m\u001b[34m ⠿ Container milvus-standalone  Created                                    0.1s\n",
      "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l\u001b[34m[+] Running 3/3\u001b[0m\n",
      "\u001b[34m ⠿ Container milvus-minio       Started                                    0.7s\n",
      "\u001b[0m\u001b[34m ⠿ Container milvus-etcd        Started                                    0.8s\n",
      "\u001b[0m\u001b[34m ⠿ Container milvus-standalone  Created                                    0.1s\n",
      "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l\u001b[34m[+] Running 3/3\u001b[0m\n",
      "\u001b[34m ⠿ Container milvus-minio       Started                                    0.7s\n",
      "\u001b[0m\u001b[34m ⠿ Container milvus-etcd        Started                                    0.8s\n",
      "\u001b[0m\u001b[34m ⠿ Container milvus-standalone  Created                                    0.1s\n",
      "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l\u001b[34m[+] Running 3/3\u001b[0m\n",
      "\u001b[34m ⠿ Container milvus-minio       Started                                    0.7s\n",
      "\u001b[0m\u001b[34m ⠿ Container milvus-etcd        Started                                    0.8s\n",
      "\u001b[0m\u001b[34m ⠿ Container milvus-standalone  Created                                    0.1s\n",
      "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l\u001b[34m[+] Running 3/3\u001b[0m\n",
      "\u001b[34m ⠿ Container milvus-minio       Started                                    0.7s\n",
      "\u001b[0m\u001b[34m ⠿ Container milvus-etcd        Started                                    0.8s\n",
      "\u001b[0m\u001b[34m ⠿ Container milvus-standalone  Created                                    0.1s\n",
      "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l\u001b[34m[+] Running 3/3\u001b[0m\n",
      "\u001b[34m ⠿ Container milvus-minio       Started                                    0.7s\n",
      "\u001b[0m\u001b[34m ⠿ Container milvus-etcd        Started                                    0.8s\n",
      "\u001b[0m\u001b[34m ⠿ Container milvus-standalone  Created                                    0.1s\n",
      "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 2/3\n",
      "\u001b[34m ⠿ Container milvus-minio       Started                                    0.7s\n",
      "\u001b[0m\u001b[34m ⠿ Container milvus-etcd        Started                                    0.8s\n",
      "\u001b[0m\u001b[37m ⠿ Container milvus-standalone  Starting                                   1.3s\n",
      "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 2/3\n",
      "\u001b[34m ⠿ Container milvus-minio       Started                                    0.7s\n",
      "\u001b[0m\u001b[34m ⠿ Container milvus-etcd        Started                                    0.8s\n",
      "\u001b[0m\u001b[37m ⠿ Container milvus-standalone  Starting                                   1.4s\n",
      "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 2/3\n",
      "\u001b[34m ⠿ Container milvus-minio       Started                                    0.7s\n",
      "\u001b[0m\u001b[34m ⠿ Container milvus-etcd        Started                                    0.8s\n",
      "\u001b[0m\u001b[37m ⠿ Container milvus-standalone  Starting                                   1.5s\n",
      "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 2/3\n",
      "\u001b[34m ⠿ Container milvus-minio       Started                                    0.7s\n",
      "\u001b[0m\u001b[34m ⠿ Container milvus-etcd        Started                                    0.8s\n",
      "\u001b[0m\u001b[37m ⠿ Container milvus-standalone  Starting                                   1.6s\n",
      "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l\u001b[34m[+] Running 3/3\u001b[0m\n",
      "\u001b[34m ⠿ Container milvus-minio       Started                                    0.7s\n",
      "\u001b[0m\u001b[34m ⠿ Container milvus-etcd        Started                                    0.8s\n",
      "\u001b[0m\u001b[34m ⠿ Container milvus-standalone  Started                                    1.7s\n",
      "\u001b[0m\u001b[?25h"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Starting Redis Server\n",
    "We are using Redis as a metadata storage service. Code can easily be modified to use a python dictionary, but that usually does not work in any use case outside of quick examples. We need a metadata storage service in order to be able to be able to map between embeddings and the corresponding data."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "source": [
    "!docker run  --name redis -d -p 6379:6379 redis"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "6a52249b26d22fb51b7f90bfbb427f3931b2b9ab97e55ca06334a188fc09149c\r\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Confirm Running Servers"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "source": [
    "! docker ps"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "CONTAINER ID   IMAGE          COMMAND                  CREATED          STATUS                    PORTS                                           NAMES\n",
      "6a52249b26d2   redis          \"docker-entrypoint.s…\"   39 seconds ago   Up 38 seconds             0.0.0.0:6379->6379/tcp, :::6379->6379/tcp       redis\n",
      "67f15c1687a6   d4f48c432200   \"/tini -- milvus run…\"   48 seconds ago   Up 46 seconds             0.0.0.0:19530->19530/tcp, :::19530->19530/tcp   milvus-standalone\n",
      "4a58d7d3bf34   a7908fd5fb88   \"etcd -advertise-cli…\"   48 seconds ago   Up 47 seconds             2379-2380/tcp                                   milvus-etcd\n",
      "b1faeb73971e   b849e0c26c4c   \"/usr/bin/docker-ent…\"   48 seconds ago   Up 47 seconds (healthy)   9000/tcp                                        milvus-minio\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Downloading Pretrained Models\n",
    "This PaddlePaddle model is used to transform user information into vectors."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "source": [
    "!wget https://paddlerec.bj.bcebos.com/aistudio/user_vector.tar.gz --no-check-certificate\n",
    "!mkdir movie_recommender\n",
    "!mkdir movie_recommender/user_vector_model\n",
    "!tar xf user_vector.tar.gz -C movie_recommender/user_vector_model/\n",
    "!rm user_vector.tar.gz"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "--2021-09-02 16:45:40--  https://paddlerec.bj.bcebos.com/aistudio/user_vector.tar.gz\n",
      "Resolving paddlerec.bj.bcebos.com (paddlerec.bj.bcebos.com)... 220.181.33.44, 220.181.33.43\n",
      "Connecting to paddlerec.bj.bcebos.com (paddlerec.bj.bcebos.com)|220.181.33.44|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 33650924 (32M) [application/x-gzip]\n",
      "Saving to: ‘user_vector.tar.gz’\n",
      "\n",
      "user_vector.tar.gz  100%[===================>]  32.09M  3.53MB/s    in 10s     \n",
      "\n",
      "2021-09-02 16:45:50 (3.16 MB/s) - ‘user_vector.tar.gz’ saved [33650924/33650924]\n",
      "\n"
     ]
    }
   ],
   "metadata": {
    "scrolled": true
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Downloading Data"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "source": [
    "# Download movie information\n",
    "!wget -P movie_recommender https://paddlerec.bj.bcebos.com/aistudio/movies.dat --no-check-certificate\n",
    "# Download movie vecotrs\n",
    "!wget -P movie_recommender https://paddlerec.bj.bcebos.com/aistudio/movie_vectors.txt --no-check-certificate"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "--2021-09-02 16:46:01--  https://paddlerec.bj.bcebos.com/aistudio/movies.dat\n",
      "Resolving paddlerec.bj.bcebos.com (paddlerec.bj.bcebos.com)... 220.181.33.43, 220.181.33.44\n",
      "Connecting to paddlerec.bj.bcebos.com (paddlerec.bj.bcebos.com)|220.181.33.43|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 171308 (167K) [application/octet-stream]\n",
      "Saving to: ‘movie_recommender/movies.dat’\n",
      "\n",
      "movies.dat          100%[===================>] 167.29K   546KB/s    in 0.3s    \n",
      "\n",
      "2021-09-02 16:46:01 (546 KB/s) - ‘movie_recommender/movies.dat’ saved [171308/171308]\n",
      "\n",
      "--2021-09-02 16:46:01--  https://paddlerec.bj.bcebos.com/aistudio/movie_vectors.txt\n",
      "Resolving paddlerec.bj.bcebos.com (paddlerec.bj.bcebos.com)... 220.181.33.44, 220.181.33.43\n",
      "Connecting to paddlerec.bj.bcebos.com (paddlerec.bj.bcebos.com)|220.181.33.44|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 1095505 (1.0M) [text/plain]\n",
      "Saving to: ‘movie_recommender/movie_vectors.txt’\n",
      "\n",
      "movie_vectors.txt   100%[===================>]   1.04M  1.05MB/s    in 1.0s    \n",
      "\n",
      "2021-09-02 16:46:03 (1.05 MB/s) - ‘movie_recommender/movie_vectors.txt’ saved [1095505/1095505]\n",
      "\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Code Overview\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Importing Movies into Milvus"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "#### 1. Connectings to Milvus and Redis\n",
    "Both servers are running as Docker containers on the localhost with their corresponding default ports."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "source": [
    "from pymilvus import *\n",
    "import redis\n",
    "\n",
    "connections.connect()\n",
    "# connections.connect(\"default\", host=\"localhost\", port=\"19530\")\n",
    "r = redis.StrictRedis(host=\"localhost\", port=6379) "
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "#### 2. Loading Movies into Redis\n",
    "We begin by loading all the movie files into redis. "
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "source": [
    "import json\n",
    "import codecs\n",
    "\n",
    "#1::Toy Story (1995)::Animation|Children's|Comedy\n",
    "def process_movie(lines, redis_cli):\n",
    "    for line in lines:\n",
    "        if len(line.strip()) == 0:\n",
    "            continue\n",
    "        tmp = line.strip().split(\"::\")\n",
    "        movie_id = tmp[0]\n",
    "        title = tmp[1]\n",
    "        genre_group = tmp[2]\n",
    "        tmp = genre_group.strip().split(\"|\")\n",
    "        genre = tmp\n",
    "        movie_info = {\"movie_id\" : movie_id,\n",
    "                \"title\" : title,\n",
    "                \"genre\" : genre\n",
    "                }\n",
    "        redis_cli.set(\"{}##movie_info\".format(movie_id), json.dumps(movie_info))\n",
    "        \n",
    "with codecs.open(\"movie_recommender/movies.dat\", \"r\",encoding='utf-8',errors='ignore') as f:\n",
    "        lines = f.readlines()\n",
    "        process_movie(lines, r)"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "#### 3. Creating Partition and Collection in Milvus"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "source": [
    "COLLECTION_NAME = 'demo_films'\r\n",
    "PARTITION_NAME = 'Movie'\r\n",
    "\r\n",
    "pk = FieldSchema(name='pk', dtype=DataType.INT64, is_primary=True, auto_id=False)\r\n",
    "field = FieldSchema(name='vec', dtype=DataType.FLOAT_VECTOR, dim=32)\r\n",
    "schema = CollectionSchema(fields=[pk, field], description=\"movie recommender: demo films\")\r\n",
    "\r\n",
    "if utility.get_connection().has_collection(COLLECTION_NAME): # drop the same collection created before\r\n",
    "    collection = Collection(COLLECTION_NAME)\r\n",
    "    collection.drop()\r\n",
    "else:\r\n",
    "    collection = Collection(name=COLLECTION_NAME, schema=schema)\r\n",
    "    partition = collection.create_partition(PARTITION_NAME)\r\n",
    "    print(\"Collection & partition are successfully created.\")"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "Collection & partition are successfully created.\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "#### 4. Getting Embeddings and IDs\n",
    "The vectors in `movie_vectors.txt` are obtained from the `user_vector_model` downloaded above. So we can directly get the vectors and the IDs by reading the file."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "source": [
    "def get_vectors():\n",
    "    with codecs.open(\"movie_recommender/movie_vectors.txt\", \"r\", encoding='utf-8', errors='ignore') as f:\n",
    "        lines = f.readlines()\n",
    "    ids = [int(line.split(\":\")[0]) for line in lines]\n",
    "    embeddings = []\n",
    "    for line in lines:\n",
    "        line = line.strip().split(\":\")[1][1:-1]\n",
    "        str_nums = line.split(\",\")\n",
    "        emb = [float(x) for x in str_nums]\n",
    "        embeddings.append(emb)\n",
    "    return ids, embeddings\n",
    "\n",
    "ids, embeddings = get_vectors()"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "#### 4. Importing Vectors into Milvus\n",
    "Import vectors into the partition **Movie** under the collection **demo_films**."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "source": [
    "if collection.num_entities != 0:\n",
    "    print(COLLECTION_NAME + \" is not empty!\")  \n",
    "else:\n",
    "    mr = collection.insert(data=[ids,embeddings], partition_name=PARTITION_NAME)\n",
    "\n",
    "print(\"Record count in collection: \" + str(collection.num_entities))\n",
    "# print(str(len(mr.primary_keys)) + \" ids:\\n\", mr.primary_keys[:10])"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "Record count in collection: 3883\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Build Index"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "source": [
    "# Flush collection with inserted vectors to disk\n",
    "# utility.get_connection().flush([COLLECTION_NAME])\n",
    "\n",
    "index_param = {\n",
    "    \"metric_type\": \"L2\",\n",
    "    \"index_type\":\"IVF_FLAT\",\n",
    "    \"params\":{\"nlist\":128}\n",
    "}\n",
    "\n",
    "collection.create_index(field_name=\"vec\", index_params=index_param)"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "Status(code=0, message='')"
      ]
     },
     "metadata": {},
     "execution_count": 13
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Recalling Vectors in Milvus\n",
    "#### 1. Genarating User Embeddings\n",
    "Pass in the gender, age and occupation of the user we want to recommend. **user_vector_model** model will generate the corresponding user vector.\n",
    "Occupation is chosen from the following choices:\n",
    "*  0:  \"other\" or not specified\n",
    "*  1:  \"academic/educator\"\n",
    "*  2:  \"artist\"\n",
    "*  3:  \"clerical/admin\"\n",
    "*  4:  \"college/grad student\"\n",
    "*  5:  \"customer service\"\n",
    "*  6:  \"doctor/health care\"\n",
    "*  7:  \"executive/managerial\"\n",
    "*  8:  \"farmer\"\n",
    "*  9:  \"homemaker\"\n",
    "*  10:  \"K-12 student\"\n",
    "*  11:  \"lawyer\"\n",
    "*  12:  \"programmer\"\n",
    "*  13:  \"retired\"\n",
    "*  14:  \"sales/marketing\"\n",
    "*  15:  \"scientist\"\n",
    "*  16:  \"self-employed\"\n",
    "*  17:  \"technician/engineer\"\n",
    "*  18:  \"tradesman/craftsman\"\n",
    "*  19:  \"unemployed\"\n",
    "*  20:  \"writer\""
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "source": [
    "import numpy as np\n",
    "from paddle_serving_app.local_predict import LocalPredictor\n",
    "\n",
    "class RecallServerServicer(object):\n",
    "    def __init__(self):\n",
    "        self.uv_client = LocalPredictor()\n",
    "        self.uv_client.load_model_config(\"movie_recommender/user_vector_model/serving_server_dir\") \n",
    "        \n",
    "    def hash2(self, a):\n",
    "        return hash(a) % 1000000\n",
    "\n",
    "    def get_user_vector(self):\n",
    "        dic = {\"userid\": [], \"gender\": [], \"age\": [], \"occupation\": []}\n",
    "        lod = [0]\n",
    "        dic[\"userid\"].append(self.hash2('0'))\n",
    "        dic[\"gender\"].append(self.hash2('M'))\n",
    "        dic[\"age\"].append(self.hash2('23'))\n",
    "        dic[\"occupation\"].append(self.hash2('6'))\n",
    "        lod.append(1)\n",
    "\n",
    "        dic[\"userid.lod\"] = lod\n",
    "        dic[\"gender.lod\"] = lod\n",
    "        dic[\"age.lod\"] = lod\n",
    "        dic[\"occupation.lod\"] = lod\n",
    "        for key in dic:\n",
    "            dic[key] = np.array(dic[key]).astype(np.int64).reshape(len(dic[key]),1)\n",
    "        fetch_map = self.uv_client.predict(feed=dic, fetch=[\"save_infer_model/scale_0.tmp_1\"], batch=True)\n",
    "        return fetch_map[\"save_infer_model/scale_0.tmp_1\"].tolist()[0]\n",
    "\n",
    "recall = RecallServerServicer()\n",
    "user_vector = recall.get_user_vector()"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stderr",
     "text": [
      "2021-09-02 16:47:40,921 - INFO - LocalPredictor load_model_config params: model_path:movie_recommender/user_vector_model/serving_server_dir, use_gpu:False, gpu_id:0, use_profile:False, thread_num:1, mem_optim:True, ir_optim:False, use_trt:False, use_lite:False, use_xpu:False, precision:fp32, use_calib:False, use_mkldnn:False, mkldnn_cache_capacity:0, mkldnn_op_list:None, mkldnn_bf16_op_list:None, use_feed_fetch_ops:False, \n"
     ]
    }
   ],
   "metadata": {
    "tags": []
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "#### 2. Searching\n",
    "Pass in the user vector, and then recall vectors in the previously imported data collection and partition."
   ],
   "metadata": {
    "tags": []
   }
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "source": [
    "collection.load() # load collection memory before search\n",
    "\n",
    "topK = 20\n",
    "SEARCH_PARAM = {\n",
    "    \"metric_type\":\"L2\",\n",
    "    \"params\":{\"nprobe\": 20},\n",
    "    }\n",
    "results = collection.search([user_vector],\"vec\",param=SEARCH_PARAM, limit=topK, expr=None, output_fields=None)"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stderr",
     "text": [
      "/Users/mengjiagu/opt/anaconda3/lib/python3.8/site-packages/ipykernel/ipkernel.py:287: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above.\n",
      "  and should_run_async(code)\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "#### 3. Returning Information by IDs"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "source": [
    "I = []\n",
    "for x in results:\n",
    "    for y in x.ids:\n",
    "        I.append(y)\n",
    "        \n",
    "recall_results = []\n",
    "for x in I:\n",
    "    recall_results.append(r.get(\"{}##movie_info\".format(x)).decode('utf-8'))\n",
    "recall_results"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "['{\"movie_id\": \"760\", \"title\": \"Stalingrad (1993)\", \"genre\": [\"War\"]}',\n",
       " '{\"movie_id\": \"1350\", \"title\": \"Omen, The (1976)\", \"genre\": [\"Horror\"]}',\n",
       " '{\"movie_id\": \"1258\", \"title\": \"Shining, The (1980)\", \"genre\": [\"Horror\"]}',\n",
       " '{\"movie_id\": \"632\", \"title\": \"Land and Freedom (Tierra y libertad) (1995)\", \"genre\": [\"War\"]}',\n",
       " '{\"movie_id\": \"3007\", \"title\": \"American Movie (1999)\", \"genre\": [\"Documentary\"]}',\n",
       " '{\"movie_id\": \"2086\", \"title\": \"One Magic Christmas (1985)\", \"genre\": [\"Drama\", \"Fantasy\"]}',\n",
       " '{\"movie_id\": \"1051\", \"title\": \"Trees Lounge (1996)\", \"genre\": [\"Drama\"]}',\n",
       " '{\"movie_id\": \"3920\", \"title\": \"Faraway, So Close (In Weiter Ferne, So Nah!) (1993)\", \"genre\": [\"Drama\", \"Fantasy\"]}',\n",
       " '{\"movie_id\": \"1303\", \"title\": \"Man Who Would Be King, The (1975)\", \"genre\": [\"Adventure\"]}',\n",
       " '{\"movie_id\": \"652\", \"title\": \"301, 302 (1995)\", \"genre\": [\"Mystery\"]}',\n",
       " '{\"movie_id\": \"1605\", \"title\": \"Excess Baggage (1997)\", \"genre\": [\"Adventure\", \"Romance\"]}',\n",
       " '{\"movie_id\": \"1275\", \"title\": \"Highlander (1986)\", \"genre\": [\"Action\", \"Adventure\"]}',\n",
       " '{\"movie_id\": \"1126\", \"title\": \"Drop Dead Fred (1991)\", \"genre\": [\"Comedy\", \"Fantasy\"]}',\n",
       " '{\"movie_id\": \"792\", \"title\": \"Hungarian Fairy Tale, A (1987)\", \"genre\": [\"Fantasy\"]}',\n",
       " '{\"movie_id\": \"2228\", \"title\": \"Mountain Eagle, The (1926)\", \"genre\": [\"Drama\"]}',\n",
       " '{\"movie_id\": \"2659\", \"title\": \"It Came from Hollywood (1982)\", \"genre\": [\"Comedy\", \"Documentary\"]}',\n",
       " '{\"movie_id\": \"2545\", \"title\": \"Relax... It\\'s Just Sex (1998)\", \"genre\": [\"Comedy\"]}',\n",
       " '{\"movie_id\": \"1289\", \"title\": \"Koyaanisqatsi (1983)\", \"genre\": [\"Documentary\", \"War\"]}',\n",
       " '{\"movie_id\": \"2537\", \"title\": \"Beyond the Poseidon Adventure (1979)\", \"genre\": [\"Adventure\"]}',\n",
       " '{\"movie_id\": \"2864\", \"title\": \"Splendor (1999)\", \"genre\": [\"Comedy\"]}']"
      ]
     },
     "metadata": {},
     "execution_count": 16
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Conclusion"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "After completing the recall service, the results can be further sorted using the **movie_recommender** model, and then the movies with high similarity scores can be recommended to users. You can try this deployable recommender system using this [quick start](QUICK_START.md)."
   ],
   "metadata": {}
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}